[MUSIC]
This lecture is about the Latent Aspect
Rating Analysis for Opinion Mining and
Sentiment Analysis.
In this lecture,
we're going to continue discussing
Opinion Mining and Sentiment Analysis.
In particular, we're going to introduce
Latent Aspect Rating Analysis
which allows us to perform detailed
analysis of reviews with overall ratings.
So, first is motivation.
Here are two reviews that you often
see in the net about the hotel.
And you see some overall ratings.
In this case,
both reviewers have given five stars.
And, of course,
there are also reviews that are in text.
Now, if you just look at these reviews,
it's not very clear whether the hotel is
good for its location or for its service.
It's also unclear why
a reviewer liked this hotel.
What we want to do is to
decompose this overall rating into
ratings on different aspects such as
value, rooms, location, and service.
So, if we can decompose
the overall ratings,
the ratings on these different aspects,
then, we
can obtain a more detailed understanding
of the reviewer's opinionsabout the hotel.
And this would also allow us to rank
hotels along different dimensions
such as value or rooms.
But, in general, such detailed
understanding will reveal more information
about the user's preferences,
reviewer's preferences.
And also, we can understand better
how the reviewers view this
hotel from different perspectives.
Now, not only do we want to
infer these aspect ratings,
we also want to infer the aspect weights.
So, some reviewers may care more about
values as opposed to the service.
And that would be a case.
like what's shown on the left for
the weight distribution,
where you can see a lot of
weight is places on value.
But others care more for service.
And therefore, they might place
more weight on service than value.
The reason why this is
also important is because,
do you think about a five star on value,
it might still be very expensive if the
reviewer cares a lot about service, right?
For this kind of service,
this price is good, so
the reviewer might give it a five star.
But if a reviewer really cares
about the value of the hotel,
then the five star, most likely,
would mean really cheap prices.
So, in order to interpret the ratings
on different aspects accurately,
we also need to know these aspect weights.
When they're combined together,
we can have a more detailed
understanding of the opinion.
So the task here is to get these reviews
and their overall ratings as input,
and then,
generate both the aspect ratings,
the compose aspect ratings, and
the aspect rates as output.
And this is a problem called
Latent Aspect Rating Analysis.
So the task, in general,
is given a set of review articles about
the topic with overall ratings, and
we hope to generate three things.
One is the major aspects
commented on in the reviews.
Second is ratings on each aspect,
such as value and room service.
And third is the relative weights placed
on different aspects by the reviewers.
And this task has a lot of applications,
and if you can do this,
and it will enable a lot of applications.
I just listed some here.
And later, I will show you some results.
And, for example,
we can do opinion based entity ranking.
We can generate an aspect-level
opinion summary.
We can also analyze reviewers preferences,
compare them or
compare their preferences
on different hotels.
And we can do personalized
recommendations of products.
So, of course, the question is
how can we solve this problem?
Now, as in other cases of
these advanced topics,
we won’t have time to really
cover the technique in detail.
But I’m going to give you a brisk,
basic introduction to the technique
development for this problem.
So, first step, we’re going to talk about
how to solve the problem in two stages.
Later, we’re going to also mention that
we can do this in the unified model.
Now, take this review with
the overall rating as input.
What we want to do is, first,
we're going to segment the aspects.
So we're going to pick out what words
are talking about location, and
what words are talking
about room condition, etc.
So with this, we would be able
to obtain aspect segments.
In particular, we're going to
obtain the counts of all the words
in each segment, and
this is denoted by C sub I of W and D.
Now this can be done by using seed
words like location and room or
price to retrieve
the [INAUDIBLE] in the segments.
And then, from those segments,
we can further mine correlated
words with these seed words and
that would allow us to segmented
the text into segments,
discussing different aspects.
But, of course,
later, as we will see, we can also use
[INAUDIBLE] models to do the segmentation.
But anyway, that's the first stage,
where the obtain the council
of words in each segment.
In the second stage,
which is called Latent Rating Regression,
we're going to use these words and
their frequencies in different
aspects to predict the overall rate.
And this predicting happens in two stages.
In the first stage,
we're going to use the [INAUDIBLE] and
the weights of these words in each
aspect to predict the aspect rating.
So, for example, if in your discussion
of location, you see a word like,
amazing, mentioned many times,
and it has a high weight.
For example, here, 3.9.
Then, it will increase
the Aspect Rating for location.
But, another word like, far,
which is an acted weight,
if it's mentioned many times,
and it will decrease the rating.
So the aspect ratings, assume that it
will be a weighted combination of these
word frequencies where the weights
are the sentiment weights of the words.
Of course, these sentimental weights
might be different for different aspects.
So we have, for each aspect, a set of
term sentiment weights as shown here.
And that's in order by beta sub I and W.
In the second stage or second step,
we're going to assume that the overall
rating is simply a weighted
combination of these aspect ratings.
So we're going to assume we have aspect
weights to the [INAUDIBLE] sub i of d,
and this will be used to take a weighted
average of the aspect ratings,
which are denoted by r sub i of d.
And we're going to assume the overall
rating is simply a weighted
average of these aspect ratings.
So this set up allows us to predict
the overall rating based on
the observable frequencies.
So on the left side,
you will see all these observed
information, the r sub d and the count.
But on the right side,
you see all the information in
that range is actually latent.
So, we hope to discover that.
Now, this is a typical case of
a generating model where would embed
the interesting variables
in the generated model.
And then, we're going to set up
a generation probability for
the overall rating given
the observed words.
And then, of course, we can adjust these
parameter values including betas Rs and
alpha Is in order to maximize
the probability of the data.
In this case, the conditional probability
of the observed rating given the document.
So we have seen such cases before in, for
example, PISA,
where we predict a text data.
But here, we're predicting the rating,
and the parameters,
of course, are very different.
But we can see, if we can uncover
these parameters, it would be nice,
because r sub i of d is precise as
the ratings that we want to get.
And these are the composer
ratings on different aspects.
[INAUDIBLE] sub I D is precisely
the aspect weights that we
hope to get as a byproduct,
that we also get the beta factor, and
these are the [INAUDIBLE] factor,
the sentiment weights of words.
So more formally,
the data we are modeling here is a set of
review documents with overall ratings.
And each review document denote by a d,
and the overall ratings denote by r sub d.
And d pre-segments turn
into k aspect segments.
And we're going to use ci(w,d) to denote
the count of word w in aspect segment i.
Of course, it's zero if the word
doesn't occur in the segment.
Now, the model is going to
predict the rating based on d.
So, we're interested in the provisional
problem of r sub-d given d.
And this model is set up as follows.
So r sub-d is assumed the two
follow a normal distribution
doesn't mean that denotes
actually await the average
of the aspect of ratings r
Sub I of d as shown here.
This normal distribution is
a variance of data squared.
Now, of course,
this is just our assumption.
The actual rating is not necessarily
anything thing this way.
But as always, when we make this
assumption, we have a formal way to
model the problem and that allows us
to compute the interest in quantities.
In this case, the aspect ratings and
the aspect weights.
Now, the aspect rating as
you see on the [INAUDIBLE]
is assuming that will be
a weight of sum of these weights.
Where the weight is just
the [INAUDIBLE] of the weight.
So as I said,
the overall rating is assumed to be
a weighted average of aspect ratings.
Now, these other values, r for
sub I of D, or denoted together
by other vector that depends on D is
that the token of specific weights.
And we’re going to assume that
this vector itself is drawn
from another Multivariate Gaussian
distribution,
with mean denoted by a Mu factor,
and covariance metrics sigma here.
Now, so this means, when we generate our
overall rating, we're going to first draw
a set of other values from this
Multivariate Gaussian Prior distribution.
And once we get these other values,
we're going to use then the weighted
average of aspect ratings as
the mean here to use the normal
distribution to generate
the overall rating.
Now, the aspect rating, as I just said,
is the sum of the sentiment weights of
words in aspect, note that here the
sentiment weights are specific to aspect.
So, beta is indexed by i,
and that's for aspect.
And that gives us a way to model
different segment of a word.
This is neither because
the same word might have
positive sentiment for another aspect.
It's also used for see what parameters
we have here beta sub i and
w gives us the aspect-specific
sentiment of w.
So, obviously,
that's one of the important parameters.
But, in general, we can see we have these
parameters, beta values, the delta,
and the Mu, and sigma.
So, next, the question is, how can
we estimate these parameters and, so
we collectively denote all
the parameters by lambda here.
Now, we can, as usual,
use the maximum likelihood estimate, and
this will give us the settings
of these parameters,
that with a maximized observed ratings
condition of their respective reviews.
And of, course,
this would then give us all the useful
variables that we
are interested in computing.
So, more specifically, we can now,
once we estimate the parameters,
we can easily compute the aspect rating,
for aspect the i or sub i of d.
And that's simply to take all of the words
that occurred in the segment, i,
and then take their counts and
then multiply that by the center of
the weight of each word and take a sum.
So, of course, this time would be zero for
words that are not occurring in and
that's why were going to take the sum
of all the words in the vocabulary.
Now what about the s factor weights?
Alpha sub i of d, well,
it's not part of our parameter.
Right?
So we have to use that to compute it.
And in this case, we can use the Maximum
a Posteriori to compute this alpha value.
Basically, we're going to maximize the
product of the prior of alpha according
to our assumed Multivariate Gaussian
Distribution and the likelihood.
In this case,
the likelihood rate is the probability of
generating this observed overall rating
given this particular alpha value and
some other parameters, as you see here.
So for more details about this model,
you can read this paper cited here.
[MUSIC]

